nbinom (Negative binomial distribution)#
The negative binomial distribution models a count (discrete) outcome. In the `scipy.stats.nbinom` parameterization, it is the number of failures \(K\) observed before achieving `n` successes, when each trial succeeds with probability `p`.
Two common interpretations, depending on `n`:

- If `n` is a positive integer, this is literally a "repeat Bernoulli trials until `n` successes" model.
- If `n` is a positive real (as allowed by SciPy), `n` is best thought of as a dispersion/shape parameter via the Gamma–Poisson mixture view.
This notebook uses the same parameterization as `scipy.stats.nbinom`:

- `n` > 0 (integer in the classic counting story; real-valued in many statistical models)
- `p` ∈ (0, 1] (success probability)
Learning goals#
By the end you should be able to:
- write the PMF/CDF and map between common parameterizations
- derive mean/variance and a usable log-likelihood
- sample using a NumPy-only algorithm (Gamma–Poisson mixture)
- visualize PMF/CDF and validate by Monte Carlo
- use `scipy.stats.nbinom` for `pmf`, `cdf`, `rvs`, and fitting
import math
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
1) Title & Classification#
Name: nbinom (Negative binomial distribution)
Type: Discrete
Support: \[ k \in \{0, 1, 2, \dots\} \]
Parameter space (SciPy): \[ n > 0, \qquad 0 < p \le 1 \]
Interpretation (classic Bernoulli-trial story):

- `n` is the number of successes we wait for
- `p` is the success probability per trial
- the random variable \(K\) counts failures before the `n`-th success
2) Intuition & Motivation#
What this distribution models#
The negative binomial answers questions like:
- "How many misses happen before we see `n` hits?"
- "How many non-events occur before the `n`-th event?"
If n=1, you get the geometric distribution (failures before the first success).
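A quick numerical check of this special case (a minimal sketch; SciPy is imported locally here since the notebook only imports it in a later section):

import numpy as np
from scipy import stats

ks_chk = np.arange(6)
p0 = 0.3
# NB(1, p) = geometric "failures before first success".
# SciPy's geom counts trials (support {1,2,...}), so shift with loc=-1.
print(stats.nbinom.pmf(ks_chk, 1, p0))
print(stats.geom.pmf(ks_chk, p0, loc=-1))  # same values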
Typical real-world use cases#
The negative binomial is a workhorse for count data when the Poisson model is too “tight”. Common examples:
number of defects before a fixed number of passes
number of insurance claims, incidents, arrivals in a time window
word counts in documents (topic models / language)
RNA-seq / gene expression counts (overdispersed relative to Poisson)
Relations to other distributions#
Two especially important relationships:

- Sum of geometrics (integer `n`):
  \[ K = G_1 + \cdots + G_n, \qquad G_i \sim \text{Geom}(p) \text{ on } \{0,1,2,\dots\} \]
- Gamma–Poisson mixture (any real `n > 0`):
  \[ \Lambda \sim \text{Gamma}(\text{shape}=n,\ \text{scale}=\tfrac{1-p}{p}), \qquad K \mid \Lambda \sim \text{Poisson}(\Lambda) \]
  Then marginally \(K \sim \text{NB}(n,p)\). This view explains why the negative binomial is used for overdispersed counts.
A convenient re-parameterization in terms of the mean \(\mu\) and dispersion \(n\) is:
\[
\mu = \mathbb{E}[K], \qquad p = \frac{n}{n+\mu}, \qquad \mathrm{Var}(K) = \mu + \frac{\mu^2}{n}
\]
So smaller \(n\) means more overdispersion.
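To make the identity concrete, a small check of the \(p = n/(n+\mu)\) mapping against SciPy's moments (parameter values chosen arbitrarily for illustration):

import scipy.stats as st

mu, n_disp = 20.0, 4.0
p_from_mu = n_disp / (n_disp + mu)            # p = n / (n + mu)
mean_chk, var_chk = st.nbinom.stats(n_disp, p_from_mu, moments="mv")
print(float(mean_chk))                        # 20.0  -> recovers mu
print(float(var_chk), mu + mu**2 / n_disp)    # 120.0 computed both ways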
3) Formal Definition#
Let \(K \sim \text{NB}(n,p)\) denote the number of failures before the `n`-th success.
PMF#
For integers \(k \ge 0\): \[ \mathbb{P}(K=k) = \binom{k+n-1}{k}(1-p)^k p^n \]
A numerically convenient equivalent form (valid for real \(n>0\)) uses gamma functions: \[ \binom{k+n-1}{k} = \frac{\Gamma(k+n)}{\Gamma(n)\,\Gamma(k+1)} \] so \[ \mathbb{P}(K=k) = \frac{\Gamma(k+n)}{\Gamma(n)\,\Gamma(k+1)}\,(1-p)^k\,p^n. \]
CDF#
For real \(x\), define \(\lfloor x \rfloor\) as the floor. Then \[ F(x) = \mathbb{P}(K \le x) = \sum_{k=0}^{\lfloor x \rfloor} \mathbb{P}(K=k). \]
A standard special-function identity is \[ \mathbb{P}(K \le k) = I_{p}(n,\ k+1), \qquad k \in \{0,1,2,\dots\} \] where \(I\) is the regularized incomplete beta function.
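This identity is straightforward to verify with `scipy.special.betainc`, which evaluates the regularized incomplete beta function \(I_x(a,b)\) (a quick sketch with arbitrary parameter values):

import numpy as np
from scipy import special, stats

n0, p0 = 8.0, 0.35
ks_chk = np.arange(0, 30)
lhs = stats.nbinom.cdf(ks_chk, n0, p0)
rhs = special.betainc(n0, ks_chk + 1.0, p0)   # I_p(n, k+1)
print(np.max(np.abs(lhs - rhs)))              # agreement to machine precision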
4) Moments & Properties#
Let \(q = 1-p\).
Mean and variance#
\[ \mathbb{E}[K] = \frac{nq}{p}, \qquad \mathrm{Var}(K) = \frac{nq}{p^2}. \]
Equivalently, if you use the mean \(\mu = \frac{nq}{p}\), then \[ \mathrm{Var}(K) = \mu + \frac{\mu^2}{n}. \]
Skewness and kurtosis#
\[ \text{skew}(K) = \frac{2-p}{\sqrt{n(1-p)}} \]
The excess kurtosis is \[ \text{excess kurt}(K) = \frac{6 + \frac{p^2}{1-p}}{n} \] (so the full kurtosis is \(3 + \text{excess kurt}\)).
MGF and characteristic function#
For \(t < -\log(1-p)\): \[ M_K(t) = \mathbb{E}[e^{tK}] = \left(\frac{p}{1-(1-p)e^{t}}\right)^{n}. \]
For real \(t\): \[ \varphi_K(t) = \mathbb{E}[e^{itK}] = \left(\frac{p}{1-(1-p)e^{it}}\right)^{n}. \]
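A truncated-sum sanity check of the MGF formula (the truncation point is an assumption chosen so the ignored tail mass is negligible for these parameters):

import numpy as np
from scipy import stats

n0, p0, t = 6.0, 0.4, 0.1                     # requires t < -log(1-p) ≈ 0.511
ks_chk = np.arange(0, 2000)                   # truncation grid
mgf_num = np.sum(np.exp(t * ks_chk) * stats.nbinom.pmf(ks_chk, n0, p0))
mgf_formula = (p0 / (1.0 - (1.0 - p0) * np.exp(t))) ** n0
print(mgf_num, mgf_formula)                   # should agree closely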
Entropy#
The (Shannon) entropy is \[ H(K) = -\sum_{k=0}^{\infty} \mathbb{P}(K=k)\,\log \mathbb{P}(K=k). \] There is no simple elementary closed form; in practice you compute it numerically (e.g., via truncation or via SciPy).
Other useful properties#
- Mode: for \(n>1\), \(\left\lfloor \frac{(n-1)(1-p)}{p} \right\rfloor\). For \(0<n\le 1\), the mode is 0.
- Additivity (same `p`): if \(K_1 \sim \text{NB}(n_1,p)\) and \(K_2 \sim \text{NB}(n_2,p)\) are independent, then \(K_1 + K_2 \sim \text{NB}(n_1+n_2,\,p)\) (a quick Monte Carlo check follows below).
- Poisson limit: with mean \(\mu\) fixed and \(n \to \infty\), the distribution approaches \(\text{Poisson}(\mu)\).
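A quick Monte Carlo check of the additivity property (a sketch using NumPy's built-in sampler with arbitrarily chosen parameters):

import numpy as np
from scipy import stats

rng_chk = np.random.default_rng(0)
n1, n2, p0 = 3.0, 5.0, 0.4
s = rng_chk.negative_binomial(n1, p0, 100_000) + rng_chk.negative_binomial(n2, p0, 100_000)
# Compare the empirical mean/variance of the sum to NB(n1+n2, p)
m_th, v_th = stats.nbinom.stats(n1 + n2, p0, moments="mv")
print(float(s.mean()), float(m_th))           # both ≈ 12.0
print(float(s.var()), float(v_th))            # both ≈ 30.0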
def _validate_n_p(n, p):
if isinstance(n, bool):
raise TypeError("n must be a positive number")
n_float = float(n)
if not (n_float > 0.0):
raise ValueError("n must be > 0")
p_float = float(p)
if not (0.0 < p_float <= 1.0):
raise ValueError("p must be in (0, 1]")
return n_float, p_float
def nbinom_logpmf(k, n, p):
# Log PMF for NB(n, p) on k=0,1,2,... (SciPy parameterization).
n, p = _validate_n_p(n, p)
k_arr = np.asarray(k)
out = np.full(k_arr.shape, -np.inf, dtype=float)
k_int = k_arr.astype(int)
valid = (k_int == k_arr) & (k_int >= 0)
if not np.any(valid):
return out
if p == 1.0:
out[valid & (k_int == 0)] = 0.0
return out
kv = k_int[valid]
# log Γ(k+n) - log Γ(n) - log(k!)
log_coeff = (
np.vectorize(math.lgamma)(kv + n)
- math.lgamma(n)
- np.vectorize(math.lgamma)(kv + 1)
)
out[valid] = log_coeff + kv * math.log1p(-p) + n * math.log(p)
return out
def nbinom_pmf(k, n, p):
return np.exp(nbinom_logpmf(k, n, p))
def nbinom_mean(n, p):
n, p = _validate_n_p(n, p)
return n * (1 - p) / p
def nbinom_var(n, p):
n, p = _validate_n_p(n, p)
return n * (1 - p) / (p * p)
def nbinom_skew(n, p):
n, p = _validate_n_p(n, p)
q = 1 - p
return (2 - p) / math.sqrt(n * q)
def nbinom_excess_kurt(n, p):
n, p = _validate_n_p(n, p)
q = 1 - p
return (6 + (p * p) / q) / n
def nbinom_pmf_cdf_trunc(n, p, *, q=0.999, max_k=200_000):
# Return (ks, pmf, cdf) for k=0..K capturing ~q mass via recurrence.
n, p = _validate_n_p(n, p)
if not (0.0 < q <= 1.0):
raise ValueError("q must be in (0, 1]")
if p == 1.0:
return np.array([0]), np.array([1.0]), np.array([1.0])
pmf0 = math.exp(n * math.log(p))
pmf_vals = [pmf0]
cdf_vals = [pmf0]
k = 0
while cdf_vals[-1] < q and k < max_k:
k += 1
# f(k)/f(k-1) = ((k-1+n)/k) * (1-p)
pmf_k = pmf_vals[-1] * ((k - 1 + n) / k) * (1 - p)
pmf_vals.append(pmf_k)
cdf_vals.append(cdf_vals[-1] + pmf_k)
if pmf_k == 0.0:
break
ks = np.arange(len(pmf_vals))
pmf = np.array(pmf_vals, dtype=float)
cdf = np.minimum(1.0, np.array(cdf_vals, dtype=float))
cdf[-1] = min(1.0, cdf[-1])
return ks, pmf, cdf
def nbinom_entropy_trunc(n, p, *, q=0.999999, max_k=400_000):
# Approximate entropy in nats via truncation at mass q.
ks, pmf, _ = nbinom_pmf_cdf_trunc(n, p, q=q, max_k=max_k)
# Lower bound (ignores tail beyond the truncation point).
pmf = pmf[pmf > 0]
return float(-(pmf * np.log(pmf)).sum())
n, p = 8, 0.35
moments = {
"mean": nbinom_mean(n, p),
"var": nbinom_var(n, p),
"skew": nbinom_skew(n, p),
"excess_kurt": nbinom_excess_kurt(n, p),
"entropy_trunc_nats": nbinom_entropy_trunc(n, p),
}
moments
{'mean': 14.85714285714286,
'var': 42.44897959183674,
'skew': 0.7235728659282991,
'excess_kurt': 0.7735576923076923,
'entropy_trunc_nats': 3.2473831980948553}
# Monte Carlo check (matches formulas up to sampling error)
samples = rng.negative_binomial(n=n, p=p, size=200_000)
est_mean = samples.mean()
est_var = samples.var(ddof=0)
{
"formula_mean": moments["mean"],
"mc_mean": float(est_mean),
"formula_var": moments["var"],
"mc_var": float(est_var),
"entropy_trunc_nats": moments["entropy_trunc_nats"],
}
{'formula_mean': 14.85714285714286,
'mc_mean': 14.87479,
'formula_var': 42.44897959183674,
'mc_var': 42.5450724559,
'entropy_trunc_nats': 3.2473831980948553}
5) Parameter Interpretation#
In the SciPy parameterization:
- `p` (success probability) controls how quickly successes arrive. Larger `p` puts more mass near 0 failures.
- `n` (shape / number of successes) controls both scale and dispersion. For integer `n`, it is literally the number of successes you wait for.
Two identities are especially useful: \[ \mathbb{E}[K] = \frac{n(1-p)}{p}, \qquad \mathrm{Var}(K) = \frac{n(1-p)}{p^2}. \]
Re-parameterization by mean \(\mu\) and dispersion \(n\): \[ p = \frac{n}{n+\mu}, \qquad \mathrm{Var}(K) = \mu + \frac{\mu^2}{n}. \] So with \(\mu\) fixed:

- small `n` ⇒ large variance ⇒ heavy right tail (strong overdispersion)
- large `n` ⇒ variance close to \(\mu\) ⇒ close to Poisson
from plotly.subplots import make_subplots
# Left: vary p with fixed n
n_fixed = 10
p_values = [0.2, 0.4, 0.6, 0.8]
# Right: fix mean mu and vary dispersion n
mu_fixed = 20
n_values = [1, 3, 10, 50]
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
f"PMF vs p (n={n_fixed})",
f"PMF vs dispersion n (mean μ={mu_fixed})",
),
)
for p_ in p_values:
ks, pmf, _ = nbinom_pmf_cdf_trunc(n_fixed, p_, q=0.995)
fig.add_trace(
go.Scatter(x=ks, y=pmf, mode="markers+lines", name=f"p={p_}"),
row=1,
col=1,
)
for n_ in n_values:
p_ = n_ / (n_ + mu_fixed)
ks, pmf, _ = nbinom_pmf_cdf_trunc(n_, p_, q=0.995)
fig.add_trace(
go.Scatter(
x=ks,
y=pmf,
mode="markers+lines",
name=f"n={n_} (p={p_:.3f})",
),
row=1,
col=2,
)
fig.update_xaxes(title_text="k", row=1, col=1)
fig.update_yaxes(title_text="P(K=k)", row=1, col=1)
fig.update_xaxes(title_text="k", row=1, col=2)
fig.update_yaxes(title_text="P(K=k)", row=1, col=2)
fig.update_layout(height=430, legend_title_text="")
fig.show()
6) Derivations#
6.1 Expectation#
Using the Gamma–Poisson mixture representation: \(\Lambda \sim \text{Gamma}(n,\ \text{scale}=\frac{1-p}{p})\) and \(K \mid \Lambda \sim \text{Poisson}(\Lambda)\).
By iterated expectation: \[ \mathbb{E}[K] = \mathbb{E}\big[\,\mathbb{E}[K \mid \Lambda]\,\big] = \mathbb{E}[\Lambda] = n\,\frac{1-p}{p}. \]
6.2 Variance#
By the law of total variance: \[ \mathrm{Var}(K) = \mathbb{E}[\mathrm{Var}(K \mid \Lambda)] + \mathrm{Var}(\mathbb{E}[K \mid \Lambda]). \] For a Poisson, \(\mathrm{Var}(K \mid \Lambda) = \Lambda\) and \(\mathbb{E}[K \mid \Lambda] = \Lambda\). Hence \[ \mathrm{Var}(K) = \mathbb{E}[\Lambda] + \mathrm{Var}(\Lambda). \] For \(\Lambda \sim \text{Gamma}(n, \text{scale}=\frac{1-p}{p})\): \(\mathbb{E}[\Lambda] = n\frac{1-p}{p}\) and \(\mathrm{Var}(\Lambda) = n\big(\frac{1-p}{p}\big)^2\). So \[ \mathrm{Var}(K) = \frac{n(1-p)}{p} + \frac{n(1-p)^2}{p^2} = \frac{n(1-p)}{p^2}. \]
6.3 Likelihood#
For i.i.d. observations \(k_1,\dots,k_m\), the log-likelihood is \[ \ell(n,p) = \sum_{i=1}^m \Big[ \log\Gamma(k_i+n) - \log\Gamma(n) - \log\Gamma(k_i+1) + k_i\log(1-p) + n\log p \Big]. \]
If \(n\) is known, the MLE for \(p\) has a closed form. Differentiate with respect to \(p\) and set to zero:
\[
\hat p = \frac{n}{n+\bar k}, \qquad \bar k = \frac{1}{m}\sum_i k_i.
\]
Estimating both n and p requires numerical optimization (or the mean/dispersion re-parameterization).
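As a sketch of the joint fit, one can minimize the negative log-likelihood with `scipy.optimize.minimize`, optimizing in an unconstrained space \((\log n,\ \mathrm{logit}\,p)\) so the parameter bounds hold automatically (the synthetic data here is purely illustrative):

import numpy as np
from scipy import optimize, stats

rng_fit = np.random.default_rng(0)
data_fit = rng_fit.negative_binomial(8, 0.35, size=500)   # synthetic counts

def neg_loglik(theta):
    # theta = (log n, logit p) guarantees n > 0 and 0 < p < 1
    n_ = np.exp(theta[0])
    p_ = 1.0 / (1.0 + np.exp(-theta[1]))
    return -stats.nbinom.logpmf(data_fit, n_, p_).sum()

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
n_hat = float(np.exp(res.x[0]))
p_hat = float(1.0 / (1.0 + np.exp(-res.x[1])))
print(n_hat, p_hat)                                       # should land near (8, 0.35)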
# Visualize the likelihood for p with n fixed
def nbinom_loglik(n, p, data):
return float(nbinom_logpmf(data, n, p).sum())
n_fixed = 8
# Synthetic data (overdispersed counts)
data = rng.negative_binomial(n=n_fixed, p=0.35, size=400)
kbar = data.mean()
p_hat_closed = n_fixed / (n_fixed + kbar)
p_grid = np.linspace(1e-4, 1 - 1e-4, 400)
ll = np.array([nbinom_loglik(n_fixed, p, data) for p in p_grid])
fig = go.Figure()
fig.add_trace(go.Scatter(x=p_grid, y=ll, mode="lines", name="log-likelihood"))
fig.add_vline(x=p_hat_closed, line_dash="dash", annotation_text=f"MLE p≈{p_hat_closed:.3f}")
fig.update_layout(
title=f"Log-likelihood for p (n fixed at {n_fixed})",
xaxis_title="p",
yaxis_title="log L(p)",
width=850,
height=420,
)
fig.show()
{"kbar": float(kbar), "p_hat_closed": float(p_hat_closed)}
{'kbar': 14.935, 'p_hat_closed': 0.34881185960322647}
7) Sampling & Simulation#
NumPy-only sampling via the Gamma–Poisson mixture#
Use the hierarchical model:

1. Sample \(\Lambda \sim \text{Gamma}(n,\ \text{scale}=\frac{1-p}{p})\)
2. Sample \(K \mid \Lambda \sim \text{Poisson}(\Lambda)\)

This produces exact samples from \(\text{NB}(n,p)\), works for any real `n > 0`, and uses only NumPy RNGs.
Alternative (integer n): sum of geometrics#
If n is an integer, you can sample n independent geometric “failures before success” variables and sum them.
def sample_nbinom_poisson_gamma(n, p, size=1, *, rng: np.random.Generator):
n, p = _validate_n_p(n, p)
if p == 1.0:
return np.zeros(size, dtype=int)
# Lambda ~ Gamma(shape=n, scale=(1-p)/p)
lam = rng.gamma(shape=n, scale=(1 - p) / p, size=size)
return rng.poisson(lam)
def sample_nbinom_geometric_sum(n, p, size=1, *, rng: np.random.Generator):
n, p = _validate_n_p(n, p)
if not float(n).is_integer():
raise ValueError("geometric-sum sampler requires integer n")
n_int = int(n)
if p == 1.0:
return np.zeros(size, dtype=int)
size_tuple = (size,) if isinstance(size, (int, np.integer)) else tuple(size)
# NumPy's geometric returns the number of trials until first success (>=1).
# Failures before success = geometric - 1.
g = rng.geometric(p, size=size_tuple + (n_int,)) - 1
return g.sum(axis=-1)
n, p = 7.5, 0.35
x = sample_nbinom_poisson_gamma(n, p, size=200_000, rng=rng)
{
"theory_mean": nbinom_mean(n, p),
"mc_mean": float(x.mean()),
"theory_var": nbinom_var(n, p),
"mc_var": float(x.var(ddof=0)),
}
{'theory_mean': 13.928571428571429,
'mc_mean': 13.939795,
'theory_var': 39.79591836734694,
'mc_var': 39.908960357975}
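For completeness, the geometric-sum sampler can be checked the same way (integer `n` only; this cell reuses the helpers defined above and is not part of the original run, so no output is shown):

x_geo = sample_nbinom_geometric_sum(8, 0.35, size=100_000, rng=rng)
{
    "theory_mean": nbinom_mean(8, 0.35),
    "mc_mean": float(x_geo.mean()),
    "theory_var": nbinom_var(8, 0.35),
    "mc_var": float(x_geo.var(ddof=0)),
}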
8) Visualization#
We’ll visualize:
the PMF (truncated to cover most probability mass)
the CDF (step function)
Monte Carlo samples vs the theoretical PMF
n, p = 10, 0.3
ks, pmf, cdf = nbinom_pmf_cdf_trunc(n, p, q=0.999)
fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(
title=f"Negative binomial PMF (n={n}, p={p}) — truncated at CDF≈{cdf[-1]:.4f}",
xaxis_title="k (failures)",
yaxis_title="P(K=k)",
)
fig_pmf.show()
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(
title=f"Negative binomial CDF (n={n}, p={p})",
xaxis_title="k",
yaxis_title="P(K≤k)",
yaxis=dict(range=[0, 1.02]),
)
fig_cdf.show()
mc = sample_nbinom_poisson_gamma(n, p, size=200_000, rng=rng)
counts = np.bincount(mc, minlength=int(ks[-1]) + 1)[: len(ks)]
pmf_hat = counts / counts.sum()
fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=pmf_hat, name="Monte Carlo", opacity=0.65))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="Theory"))
fig_mc.update_layout(
title=f"Monte Carlo vs theory (n={len(mc):,} samples)",
xaxis_title="k",
yaxis_title="Probability",
)
fig_mc.show()
9) SciPy Integration#
SciPy provides scipy.stats.nbinom as an rv_discrete distribution.
Common methods:
- `pmf`, `logpmf`
- `cdf`, `sf`, `ppf`
- `rvs`
- `stats` (mean/var/skew/excess kurtosis)
- `entropy`
Fitting#
rv_discrete objects typically do not expose a .fit(...) method.
In modern SciPy, you can fit discrete or continuous distributions with scipy.stats.fit.
Note: for modeling overdispersed counts, many libraries use the mean/dispersion parameterization; always check which convention you are using.
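As an illustration of the convention issue, a minimal sketch of converting between SciPy's \((n, p)\) and the mean/dispersion \((\mu, n)\) convention (the helper names are hypothetical, not a library API):

def nb_mu_to_p(n, mu):
    # mean/dispersion -> SciPy success probability: p = n / (n + mu)
    return n / (n + mu)

def nb_p_to_mu(n, p):
    # SciPy (n, p) -> mean: mu = n (1 - p) / p
    return n * (1 - p) / p

# Round trip: mu = 20 with n = 5 gives p = 0.2, and back again
assert abs(nb_p_to_mu(5.0, nb_mu_to_p(5.0, 20.0)) - 20.0) < 1e-12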
import scipy.stats as stats
n, p = 12, 0.4
rv = stats.nbinom(n, p) # frozen distribution (loc=0)
ks = np.arange(0, 8)
print("pmf:", rv.pmf(ks))
print("cdf:", rv.cdf(ks))
s = rv.rvs(size=10, random_state=rng)
print("rvs:", s)
mean_sp, var_sp, skew_sp, kurt_sp = rv.stats(moments="mvsk")
print("mean/var/skew/excess_kurt:", float(mean_sp), float(var_sp), float(skew_sp), float(kurt_sp))
print("entropy (nats):", float(rv.entropy()))
# Compare SciPy's skew/kurt to formulas
print("formula skew:", nbinom_skew(n, p))
print("formula excess kurt:", nbinom_excess_kurt(n, p))
# Fit with scipy.stats.fit (estimate n and p, keep loc fixed)
true_n, true_p = 7.5, 0.35
x = stats.nbinom.rvs(true_n, true_p, size=3000, random_state=rng)
fit_res = stats.fit(
stats.nbinom,
x,
bounds={
"n": (1e-6, 200.0),
"p": (1e-6, 1 - 1e-6),
"loc": (0, 0),
},
)
fit_res
pmf: [0.000017 0.000121 0.000471 0.001319 0.002968 0.005698 0.009687 0.014946]
cdf: [0.000017 0.000138 0.000609 0.001928 0.004896 0.010594 0.020282 0.035228]
rvs: [29 16 17 13 7 17 20 16 18 24]
mean/var/skew/excess_kurt: 17.999999999999996 44.999999999999986 0.5962847939999439 0.5222222222222223
entropy (nats): 3.291640885791076
formula skew: 0.5962847939999439
formula excess kurt: 0.5222222222222223
params: FitParams(n=8.0, p=0.36794014275523457, loc=0.0)
success: True
message: 'Optimization terminated successfully.'
10) Statistical Use Cases#
A) Hypothesis testing: detecting overdispersion vs a Poisson model#
A standard diagnostic for Poisson counts is the dispersion ratio: \[ \hat\phi = \frac{s^2}{\bar y} \] where \(\bar y\) is the sample mean and \(s^2\) is the sample variance. For a Poisson, \(\mathbb{E}[s^2] \approx \bar y\), so \(\hat\phi \approx 1\). Values significantly larger than 1 indicate overdispersion (a common motivation for a negative binomial model).
B) Bayesian modeling: Gamma–Poisson predictive is negative binomial#
If \(Y \mid \lambda \sim \text{Poisson}(\lambda)\) and \(\lambda \sim \text{Gamma}(\alpha, \text{rate}=\beta)\), then the prior predictive for \(Y\) is negative binomial. After observing data, the posterior predictive for a new count is also negative binomial.
C) Generative modeling: heterogeneous rates#
If each observation has its own random rate \(\Lambda_i\) (Gamma-distributed heterogeneity), and counts are Poisson given rates, then the marginal distribution of counts is negative binomial.
from scipy.stats import chi2
# Synthetic example: data generated from a negative binomial (overdispersed)
true_n, true_p = 5.0, 0.25
m = 300
y = stats.nbinom.rvs(true_n, true_p, size=m, random_state=rng)
ybar = y.mean()
s2 = y.var(ddof=1)
# Dispersion test statistic (approximate): (m-1)*s^2 / ybar ~ Chi^2_{m-1} under Poisson
D = (m - 1) * s2 / ybar
p_value_over = 1 - chi2.cdf(D, df=m - 1)
{
"sample_mean": float(ybar),
"sample_var": float(s2),
"dispersion_ratio_var_over_mean": float(s2 / ybar),
"D": float(D),
"p_value_overdispersion": float(p_value_over),
}
{'sample_mean': 14.116666666666667,
'sample_var': 61.39437012263098,
'dispersion_ratio_var_over_mean': 4.349069902429585,
'D': 1300.371900826446,
'p_value_overdispersion': 0.0}
# Bayesian modeling: Gamma-Poisson posterior predictive
# Prior: lambda ~ Gamma(alpha, rate=beta)
alpha, beta = 2.0, 1.0
# Observe Poisson counts (synthetic)
lambda_true = 3.0
obs = stats.poisson.rvs(lambda_true, size=50, random_state=rng)
alpha_post = alpha + obs.sum()
beta_post = beta + len(obs)
# Posterior predictive for a new count:
# If lambda|data ~ Gamma(alpha_post, rate=beta_post), then
# Y_new ~ NB(n=alpha_post, p=beta_post/(beta_post+1)) in SciPy's parameterization.
n_pred = alpha_post
p_pred = beta_post / (beta_post + 1.0)
rv_pred = stats.nbinom(n_pred, p_pred)
ks, pmf, cdf = nbinom_pmf_cdf_trunc(n_pred, p_pred, q=0.999)
fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf, name="Posterior predictive PMF"))
fig.update_layout(
title="Posterior predictive for next count (Gamma–Poisson → negative binomial)",
xaxis_title="k",
yaxis_title="P(Y_new=k | data)",
)
fig.show()
{
"alpha_post": float(alpha_post),
"beta_post": float(beta_post),
"predictive_n": float(n_pred),
"predictive_p": float(p_pred),
"predictive_mean": float(rv_pred.mean()),
}
{'alpha_post': 170.0,
'beta_post': 51.0,
'predictive_n': 170.0,
'predictive_p': 0.9807692307692307,
'predictive_mean': 3.3333333333333406}
# Generative modeling: heterogeneous Poisson rates
# Lambda_i ~ Gamma(shape=n, scale=(1-p)/p), then Y_i|Lambda_i ~ Poisson(Lambda_i).
# Marginally Y_i ~ NB(n, p).
n, p = 3.0, 0.4
m = 200_000
lam = rng.gamma(shape=n, scale=(1 - p) / p, size=m)
y_mix = rng.poisson(lam)
# Compare empirical histogram to the NB PMF on a truncation grid
ks, pmf, _ = nbinom_pmf_cdf_trunc(n, p, q=0.999)
counts = np.bincount(y_mix, minlength=int(ks[-1]) + 1)[: len(ks)]
pmf_hat = counts / counts.sum()
fig = go.Figure()
fig.add_trace(go.Bar(x=ks, y=pmf_hat, name="Empirical (Gamma–Poisson)", opacity=0.65))
fig.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="NB theory"))
fig.update_layout(
title="Gamma–Poisson mixture produces a negative binomial count distribution",
xaxis_title="k",
yaxis_title="Probability",
)
fig.show()
11) Pitfalls#
- Parameterization confusion: some sources count successes before failures, or count trials instead of failures. Always confirm what the random variable represents.
- `n` as real vs integer: the "wait for `n` successes" story needs integer `n`; for real `n`, interpret via the Gamma–Poisson mixture.
- Degenerate boundary: `p = 1` collapses to \(K \equiv 0\). Values extremely close to 0 or 1 can create numerical headaches.
- Numerical stability: direct factorial / binomial-coefficient computations overflow quickly. Prefer `logpmf` with gamma functions (as done here and in SciPy); see the sketch after this list.
- Truncation: visualizations and numerical sums over \(k \in \{0,1,2,\dots\}\) require truncating the tail; check the captured mass.
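A small illustration of the stability pitfall, using `scipy.special.comb` in floating point for the direct coefficient (parameter values chosen deliberately to force overflow):

from scipy import special, stats

k_big, n_big, p0 = 800, 800.0, 0.5
coeff = special.comb(k_big + n_big - 1, k_big)        # overflows to inf in float64
naive = coeff * (1 - p0) ** k_big * p0 ** n_big       # stays inf: direct formula breaks down
stable = stats.nbinom.logpmf(k_big, n_big, p0)        # finite log-probability
print(coeff, naive, stable)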
12) Summary#
- `nbinom` is a discrete distribution on \(\{0,1,2,\dots\}\) modeling failures before `n` successes.
- PMF: \(\binom{k+n-1}{k}(1-p)^k p^n\) (the gamma form extends to real `n > 0`).
- Mean/variance: \(\mathbb{E}[K] = n(1-p)/p\), \(\mathrm{Var}(K) = n(1-p)/p^2\) ⇒ overdispersion vs Poisson.
- Key relationship: Gamma–Poisson mixture ⇔ negative binomial.
- For computation, prefer the log-PMF; for simulation, the Gamma–Poisson sampler is simple and exact.